
156 research outputs found

Online scheduling with partial job values: Does timesharing or randomization help?

Author: Chin FYL
Fung SPY
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2003

We study the following online preemptive scheduling problem: given a set of jobs with release times, deadlines, processing times and weights, schedule them so as to maximize the total value obtained. Unlike in traditional scheduling problems, partially completed jobs receive partial values proportional to the amount processed. Recently, Chrobak et al. gave improved lower and upper bounds [1.236, 1.8] on the competitive ratio for this problem, the upper bound being achieved by using timesharing to simulate two equal-speed processors. In this paper we (1) give a new algorithm MIXED-κ with competitive ratio 1/(1 - (κ/(κ+1))^κ), which approaches e/(e-1) ≈ 1.582 as κ → ∞, by using timesharing to simulate κ equal-speed processors; (2) give an equivalent but much more practical algorithm MIX, which is e/(e-1)-competitive (independent of κ), by timesharing the processor with different speeds (depending on the job weights), and use its interesting properties to devise an efficient implementation; (3) improve the lower bound to 1.25 by showing an identical lower bound for randomized algorithms; and (4) prove a lower bound of 1.618 on the competitive ratio when timesharing is not allowed, thus answering an open problem raised by Chang and Yap and showing that timesharing provably helps in giving better algorithms for this problem.

HKU Scholars Hub
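The competitive ratio stated for MIXED-κ can be checked numerically. The short sketch below only evaluates the formula 1/(1 - (κ/(κ+1))^κ) for a few values of κ; it does not implement the scheduling algorithm itself, and the function name `mixed_ratio` is our own.

```python
import math

def mixed_ratio(kappa: int) -> float:
    """Evaluate the stated MIXED-kappa ratio 1 / (1 - (kappa/(kappa+1))**kappa)."""
    if kappa < 1:
        raise ValueError("kappa must be a positive integer")
    return 1.0 / (1.0 - (kappa / (kappa + 1)) ** kappa)

# Limiting value as kappa -> infinity, e/(e-1) (also the ratio stated for MIX).
LIMIT = math.e / (math.e - 1)

if __name__ == "__main__":
    for kappa in (1, 2, 3, 10, 100, 10_000):
        print(f"kappa={kappa:>6}  ratio={mixed_ratio(kappa):.4f}")
    print(f"e/(e-1) = {LIMIT:.4f}")
```

With κ = 1 the formula gives 2, and with κ = 2 it gives 1.8, the same value as the two-processor upper bound of Chrobak et al.; the values then decrease monotonically toward e/(e-1) ≈ 1.582.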

An efficient algorithm for the extended (l,d)-motif problem with unknown number of binding sites

Author: Chin FYL
Leung HCM
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2005

Finding common patterns, or motifs, in a set of DNA sequences is an important problem in molecular biology. Most motif-discovery algorithms and software require the length of the motif as input. Motivated by the fact that the motif's length is usually unknown in practice, Styczynski et al. introduced the Extended (l,d)-Motif Problem (EMP), in which the motif's length is not an input parameter. Unfortunately, the algorithm given by Styczynski et al. to solve EMP can take an unacceptably long time to run, e.g. over 3 months to discover a length-14 motif. This paper makes two main contributions. First, we eliminate another input parameter from EMP: the minimum number of binding sites in the DNA sequences. Fewer input parameters not only reduce the burden on the user but may also give more realistic and robust results, since restrictions on length or on the number of binding sites make little sense when the best motif may be neither the longest nor the one with the most binding sites. Second, we develop an efficient algorithm to solve our redefined problem. The algorithm is also a fast solution for EMP (without any sacrifice in accuracy), making EMP practical. © 2005 IEEE.

HKU Scholars Hub
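To make the (l,d) terminology concrete: under the usual definition, a length-l candidate motif has a binding site wherever some length-l window of a sequence differs from it in at most d positions (Hamming distance). The sketch below, assuming that standard definition, lists those sites for one candidate and counts how many sequences contain at least one. It is a brute-force check of a single candidate, not the authors' search algorithm, and the helper names are hypothetical.

```python
from typing import List, Tuple

def hamming(a: str, b: str) -> int:
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))

def binding_sites(motif: str, seqs: List[str], d: int) -> List[Tuple[int, int]]:
    """All (sequence index, start position) pairs whose length-l window
    is within Hamming distance d of the candidate motif."""
    l = len(motif)
    sites = []
    for i, s in enumerate(seqs):
        for p in range(len(s) - l + 1):
            if hamming(motif, s[p:p + l]) <= d:
                sites.append((i, p))
    return sites

def supporting_sequences(motif: str, seqs: List[str], d: int) -> int:
    """Number of distinct sequences containing at least one binding site."""
    return len({i for i, _ in binding_sites(motif, seqs, d)})
```

For example, with `seqs = ["ACGTTGCA", "TTACGATT", "GGGGGGGG"]`, the candidate `"ACGT"` with d = 1 has sites at `(0, 0)` (exact match) and `(1, 2)` (`"ACGA"`, one mismatch), so it is supported by two sequences.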

Finding exact optimal motifs in matrix representation by partitioning

Author: Chin FYL
Leung HCM
Publication venue: 'Oxford University Press (OUP)'
Publication date: 01/01/2005

Motivation: Finding common patterns, or motifs, in the promoter regions of co-expressed genes is an important problem in bioinformatics. A common representation of a motif is a probability matrix or PSSM (position-specific scoring matrix). However, even for a motif of length six or seven, no existing algorithm can guarantee finding the exact optimal matrix among the infinite number of possible matrices. Results: This paper introduces the first algorithm, called EOMM, for finding the exact optimal matrix-represented motif, or simply the optimal motif. Based on branch-and-bound search that partitions the solution space recursively, EOMM can find the optimal motif of size up to eight or nine, and a motif of larger size to any desired accuracy, with the trade-off that the smaller the error bound, the longer the running time. Experiments show that on some real and simulated data sets, EOMM finds the motif despite very weak signals when existing software, such as MEME and MITRA-PSSM, fails to do so. © The Author 2005. Published by Oxford University Press. All rights reserved.

HKU Scholars Hub
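For readers unfamiliar with the matrix representation, the sketch below shows one common way a position probability matrix is turned into log2-odds scores against a background distribution and then used to score candidate sites. The uniform background (0.25 per base) and the example matrix are assumptions for illustration; EOMM's branch-and-bound search over matrices is not shown.

```python
import math
from typing import Dict, List, Optional, Tuple

BASES = "ACGT"

def log_odds(ppm: List[Dict[str, float]],
             background: Optional[Dict[str, float]] = None) -> List[Dict[str, float]]:
    """Convert a position probability matrix to log2-odds scores vs. a background."""
    bg = background or {b: 0.25 for b in BASES}  # assumed uniform background
    for col in ppm:
        if any(col[b] <= 0 for b in BASES):
            raise ValueError("probabilities must be positive (add pseudocounts first)")
    return [{b: math.log2(col[b] / bg[b]) for b in BASES} for col in ppm]

def score(pssm: List[Dict[str, float]], site: str) -> float:
    """Sum of per-position scores for a site of the matrix's length."""
    if len(site) != len(pssm):
        raise ValueError("site length must equal motif length")
    return sum(col[base] for col, base in zip(pssm, site))

def best_site(pssm: List[Dict[str, float]], seq: str) -> Tuple[float, int]:
    """Highest-scoring window in seq, returned as (score, start position)."""
    l = len(pssm)
    return max((score(pssm, seq[p:p + l]), p) for p in range(len(seq) - l + 1))
```

With a hypothetical three-column matrix that puts probability 0.7 on A, C and G respectively (0.1 elsewhere), the window `"ACG"` inside `"TTACGTT"` scores highest, at start position 2.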

Online pricing for multi-type of Items

Author: Chin FYL
Ting HF
Zhang Y
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2012

LNCS v. 7285 entitled: Frontiers in algorithmics and algorithmic aspects in information and management: joint international conference, FAW-AAIM 2012 ... proceedings

In this paper, we study the problem of online pricing for bundles of items. Given a seller with k types of items, m of each, a sequence of users {u_1, u_2, ...} arrives one by one. Each user is single-minded, i.e., each user is interested only in a particular bundle of items. The seller must set the price and assign some amount of bundles to each user upon his/her arrival. Bundles can be sold fractionally. Each u_i has a value function v_i(·) such that v_i(x) is the highest unit price u_i is willing to pay for x bundles. The objective is to maximize the revenue of the seller by setting the price and amount of bundles for each user. We first show that the lower bound of the competitive ratio for this problem is Ω(log h + log k), where h is the highest unit price to be paid among all users. We then give a deterministic online algorithm, Pricing, whose competitive ratio is O(√k · log h · log k). When k = 1, the lower and upper bounds asymptotically match the optimal result O(log h). © 2012 Springer-Verlag.

HKU Scholars Hub
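The online model described above can be made concrete with a small simulator. This is an assumption-laden illustration, not the paper's Pricing algorithm: the seller uses a hypothetical posted-price rule supplied by the caller, and sells each arriving user the largest quantity, on a discrete grid and capped by the remaining stock of every item type in the user's bundle, at which v_i(x) is still at least the posted price. The names `User` and `simulate` are our own.

```python
from dataclasses import dataclass
from typing import Callable, Dict, FrozenSet, List, Tuple

@dataclass(frozen=True)
class User:
    bundle: FrozenSet[int]            # item types this single-minded user wants
    value: Callable[[float], float]   # v_i(x): highest unit price for x bundles

def simulate(users: List[User], k: int, m: float,
             price: Callable[[User, Dict[int, float]], float],
             grid: int = 100) -> Tuple[float, Dict[int, float]]:
    """Serve users online with a caller-supplied (hypothetical) price rule."""
    stock = {t: float(m) for t in range(k)}     # m units of each of k item types
    revenue = 0.0
    for u in users:
        cap = min(stock[t] for t in u.bundle)   # bundles still sellable to u
        p = price(u, stock)                     # posted unit price for this user
        x = 0.0
        for j in range(grid, 0, -1):            # largest grid amount u accepts at p
            amount = cap * j / grid
            if u.value(amount) >= p:
                x = amount
                break
        for t in u.bundle:                      # fractional sale uses every type
            stock[t] -= x
        revenue += p * x
    return revenue, stock
```

For instance, one user with v(x) = 4 - x facing a posted price of 3.5 accepts at most x = 0.5 bundles, giving revenue 1.75. Any competitive guarantee depends entirely on the price rule, which this sketch leaves open.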

Automated Hierarchical Image Segmentation Based on Merging of Quadrilaterals

Author: Chen Z
Chin FYL
Chung HY
Publication venue: WSEAS.
Publication date: 01/01/2006

Proceedings of the 6th WSEAS International Conference on Signal Processing, Computational Geometry & Artificial Vision, 2006, p. 135-140

This paper proposes a quadrilateral-based, automated hierarchical segmentation method in which quadrilaterals are first constructed from an edge map, and neighboring quadrilaterals with similar features of interest are then merged in a hierarchical manner to form regions. When evaluated qualitatively and quantitatively, the proposed method outperforms three traditional and commonly used techniques, namely K-means clustering, seeded region growing and quadrilateral-based segmentation. Experimental results show that the proposed method is robust both in recovering missed important regions and in preventing unnecessary over-segmentation, and that it offers an efficient description of the segmented objects conducive to content-based applications.

The 6th WSEAS International Conference on Signal Processing, Computational Geometry & Artificial Vision (ISCGAV'06), Crete, Greece, August 2006. In Conference Proceedings, 2006, p. 135-140

HKU Scholars Hub
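The hierarchical merging step can be illustrated with a generic greedy region-merging loop. The sketch assumes each initial region (for example, a quadrilateral built from the edge map) is summarized by a single mean feature value and a pixel count, with adjacency given as a list of id pairs, and it repeatedly merges the most similar adjacent pair while their feature difference stays within a hypothetical threshold. The feature, threshold and data layout are assumptions, not the paper's exact merging criteria.

```python
from typing import Dict, FrozenSet, List, Set, Tuple

def merge_regions(mean: Dict[int, float], size: Dict[int, int],
                  adjacency: List[Tuple[int, int]], threshold: float):
    """Greedy hierarchical merging of adjacent regions with similar mean features.

    Returns the surviving regions' means and the merge history
    [(kept_id, absorbed_id, new_mean), ...], i.e. the hierarchy bottom-up.
    """
    mean, size = dict(mean), dict(size)          # work on copies
    edges: Set[FrozenSet[int]] = {frozenset(e) for e in adjacency if e[0] != e[1]}
    history = []
    while edges:
        # Most similar adjacent pair; ties broken by smallest region ids.
        a, b = min((sorted(e) for e in edges),
                   key=lambda p: (abs(mean[p[0]] - mean[p[1]]), p))
        if abs(mean[a] - mean[b]) > threshold:
            break
        total = size[a] + size[b]
        mean[a] = (mean[a] * size[a] + mean[b] * size[b]) / total  # size-weighted mean
        size[a] = total
        del mean[b], size[b]
        # Redirect b's adjacencies to a and drop the resulting self-loop {a}.
        edges = {frozenset(a if r == b else r for r in e) for e in edges}
        edges = {e for e in edges if len(e) == 2}
        history.append((a, b, mean[a]))
    return mean, history
```

On a chain of four regions with means 10, 12, 50 and 52 and a threshold of 5, the loop merges the first two (mean 11) and the last two (mean 51), then stops because the remaining pair differs by 40.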

Approximation for minimum triangulation of convex polyhedra

Author: Chin FYL
Fung SPY
Wang CA
Publication venue: Society for Industrial and Applied Mathematics.
Publication date: 01/01/2001

The minimum triangulation of a convex polyhedron is a triangulation that contains the minimum number of tetrahedra over all its possible triangulations. Since finding the minimum triangulation of convex polyhedra was recently shown to be NP-hard, it is important to find algorithms that give good approximations. In this paper, we give a new triangulation algorithm with an improved approximation ratio 2 - Ω(1/√n). We also show that this is best possible for algorithms that only consider the combinatorial structure of the polyhedra. Copyright © 2009 ACM, Inc.

HKU Scholars Hub
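As a point of reference for the tetrahedron counts involved, the sketch below computes the size of one simple triangulation of a convex polyhedron whose facets are all triangles: pick an apex vertex v and cone it to every facet not incident to v, giving one tetrahedron per such facet. Like the algorithms discussed above, it uses only the combinatorial facet list. Choosing a maximum-degree apex is a standard baseline shown for illustration; it is not the paper's approximation algorithm.

```python
from collections import Counter
from itertools import product
from typing import Iterable, Tuple

def cone_triangulation_size(faces: Iterable[Tuple[int, int, int]]) -> Tuple[int, int]:
    """Cone from a max-degree vertex v over a simplicial convex polyhedron.

    Each triangular facet not containing v, together with v, forms one tetrahedron,
    so the count is (#facets) - (#facets incident to v). Returns (v, count).
    """
    faces = [frozenset(f) for f in faces]
    incident = Counter(v for f in faces for v in f)
    apex = max(sorted(incident), key=incident.get)   # smallest id among ties
    return apex, len(faces) - incident[apex]

# Example facet lists (vertex ids are arbitrary labels).
TETRAHEDRON = [(0, 1, 2), (0, 1, 3), (0, 2, 3), (1, 2, 3)]
# Octahedron: vertices 0/1 = +-x, 2/3 = +-y, 4/5 = +-z; one facet per octant.
OCTAHEDRON = list(product((0, 1), (2, 3), (4, 5)))
```

Here `cone_triangulation_size(TETRAHEDRON)` returns `(0, 1)` and `cone_triangulation_size(OCTAHEDRON)` returns `(0, 4)`, since every octahedron vertex lies on 4 of its 8 facets.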

Non-adaptive complex group testing with multiple positive sets

Author: Chin FYL
Leung HCM
Yiu SM
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2011

LNCS v. 6648 is the conference proceedings of TAMC 2011

Given n items, at most d of which have a particular property (referred to as positive items), a single test on a selected subset of them is positive if the subset contains any positive item. The non-adaptive group testing problem is to design how to group the items so as to minimize the number of tests required to identify all positive items, with all tests performed in parallel. This problem is well studied, and existing algorithms match the lower bound asymptotically up to a small gap of log d. An important generalization of the problem considers the case in which an individual positive item cannot make a test positive, but certain combinations of them (referred to as positive subsets) can. This problem is referred to as non-adaptive complex group testing. Assuming there are at most d positive subsets, each of size at most s, existing algorithms require either Ω(log^s n) tests for general n or O(((s+d) choose d) · log n) tests for some special values of n. However, in real situations the number of items in each test cannot be very small or very large, and the above algorithms cannot be applied because they offer no control over the number of items in each test. In this paper, we provide a novel and practical derandomized algorithm to construct the tests, which has two important properties. (1) Our algorithm requires only O(((d+s)^(d+s+1) / (d^d · s^s)) · log n) tests for all positive integers n, which matches the upper bound on the number of tests when all positive subsets are singletons, i.e. s = 1. (2) All tests in our algorithm can have the same number of tested items k. Thus, our algorithm can solve the problem with additional constraints on the number of tested items in each test, such as a maximum or minimum number of tested items. © 2011 Springer-Verlag.

The 8th Annual Conference on Theory and Applications of Models of Computation (TAMC 2011), Tokyo, Japan, 23-25 May 2011. In Lecture Notes in Computer Science, 2011, v. 6648, p. 172-18

HKU Scholars Hub
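The complex testing model can be sketched as follows. The construction below is plain random sampling in which every test contains exactly k items (not the paper's derandomized construction), paired with the complex outcome rule and a naive elimination decoder: any candidate subset of size at most s that lies entirely inside a negative test cannot be a positive subset. The parameters and seed in the usage lines are arbitrary illustrative choices, and all function names are hypothetical.

```python
import itertools
import random
from typing import FrozenSet, List

def constant_weight_tests(n: int, k: int, t: int, seed: int = 0) -> List[FrozenSet[int]]:
    """t random tests, each containing exactly k distinct items out of range(n)."""
    rng = random.Random(seed)
    return [frozenset(rng.sample(range(n), k)) for _ in range(t)]

def outcome(test: FrozenSet[int], positive_sets: List[FrozenSet[int]]) -> bool:
    """Complex model: a test is positive iff it contains some positive subset entirely."""
    return any(p <= test for p in positive_sets)

def surviving_candidates(n: int, s: int, tests: List[FrozenSet[int]],
                         outcomes: List[bool]) -> List[FrozenSet[int]]:
    """Subsets of size <= s not contained in any negative test (naive decoder)."""
    negatives = [t for t, r in zip(tests, outcomes) if not r]
    survivors = []
    for size in range(1, s + 1):
        for cand in itertools.combinations(range(n), size):
            c = frozenset(cand)
            if not any(c <= t for t in negatives):
                survivors.append(c)
    return survivors
```

A true positive subset can never be eliminated, because any test that contains it is positive by definition; how many false candidates survive depends on the number and design of the tests, which is exactly what the paper's construction controls.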

T-IDBA: A de novo Iterative de Bruijn Graph Assembler for Transcriptome

Author: Chin FYL
Leung HCM
Peng Y
Yiu SM
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2011

LNCS v. 6577 entitled: Research in computational molecular biology: 15th annual international conference, RECOMB 2011 ... : proceedings

RNA-seq data produced by next-generation sequencing technology is a useful tool for analyzing transcriptomes. However, existing de novo transcriptome assemblers do not fully exploit the properties of transcriptomes and may produce short contigs because of the splicing nature (shared exons) of genes. We propose the T-IDBA algorithm to reconstruct expressed isoforms without a reference genome. By using paired-end information to resolve long repeats in different genes and branching within the same gene due to alternative splicing, the graph can be decomposed into small components, each corresponding to a gene. The most probable isoforms with sufficient support from the paired-end reads are then found heuristically. In practice, our de novo transcriptome assembler, T-IDBA, substantially outperforms ABySS in terms of sensitivity and precision on both simulated and real data. T-IDBA is available at http://www.cs.hku.hk/~alse/tidba/. © 2011 Springer-Verlag.

HKU Scholars Hub
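T-IDBA builds on de Bruijn graphs. As background, the sketch below builds a basic de Bruijn graph from reads for one fixed k: nodes are (k-1)-mers, and each k-mer in a read adds an edge from its prefix to its suffix. A node with more than one successor marks a branch of the kind that alternative splicing (shared exons followed by different ones) creates. T-IDBA's iteration over k, paired-end decomposition and isoform search are not shown.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set

def de_bruijn(reads: Iterable[str], k: int) -> Dict[str, Set[str]]:
    """Map each (k-1)-mer to the set of (k-1)-mers that follow it in some read."""
    graph: Dict[str, Set[str]] = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])   # edge: prefix -> suffix
    return dict(graph)

def branching_nodes(graph: Dict[str, Set[str]]) -> List[str]:
    """Nodes with out-degree > 1, i.e. where paths through the graph diverge."""
    return sorted(node for node, succ in graph.items() if len(succ) > 1)
```

For the two reads `"AACGTT"` and `"AACGGA"` with k = 3, the shared prefix collapses into one path and the graph branches at node `"CG"`, whose successors are `"GT"` and `"GG"`.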

Multimedia object placement for hybrid transparent data replication

Author: Huang L
Chin FYL
Shen H
Li K
Publication venue: IEEE. The Journal's web site is located at http://ieeexplore.ieee.org/xpl/conhome.jsp?punumber=1000308
Publication date: 01/01/2000

In this paper, we present an optimal solution to the problem of multimedia object placement for hybrid transparent data replication. The performance objective is to minimize the total access cost, accounting for both transmission cost and transcoding cost. The performance of the proposed solution is evaluated through a set of carefully designed simulation experiments for various performance metrics over a wide range of system parameters. The simulation results show that our solution consistently and significantly outperforms the comparison solutions on all the performance metrics considered. © 2005 IEEE.

Crossref

Secretaría de Estado de Cultura

HKU Scholars Hub
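The objective of combining transmission and transcoding cost can be illustrated with a toy cost model. In the sketch below, which is an assumption-based illustration rather than the paper's optimal placement algorithm, each request for version r at a node is served from the cheapest placed copy: a transmission cost taken from a hypothetical distance table, plus the cost of transcoding the stored version to r. We assume versions are integers (higher means richer) and that a copy can only be transcoded to an equal or lower version. The total access cost of a placement is the sum over all requests.

```python
import math
from typing import Dict, List, Tuple

Request = Tuple[str, int]   # (requesting node, version wanted; higher = richer)
Copy = Tuple[str, int]      # (node holding a copy, version stored)

def access_cost(req: Request, placement: List[Copy],
                dist: Dict[str, Dict[str, float]],
                transcode: Dict[Tuple[int, int], float]) -> float:
    """Cheapest way to serve one request: transmission plus transcoding cost."""
    node, want = req
    best = math.inf
    for loc, have in placement:
        if have < want:
            continue  # assumed: versions can only be transcoded downward
        cost = dist[node][loc] + (0.0 if have == want else transcode[(have, want)])
        best = min(best, cost)
    return best

def total_access_cost(requests: List[Request], placement: List[Copy],
                      dist: Dict[str, Dict[str, float]],
                      transcode: Dict[Tuple[int, int], float]) -> float:
    """Objective to minimize: sum of per-request access costs."""
    return sum(access_cost(r, placement, dist, transcode) for r in requests)
```

In a small example where a proxy one hop from client `c1` caches a low-quality version, `c1` is served by that copy at cost 1 instead of fetching and transcoding the origin's copy at cost 8, which is the kind of trade-off a placement algorithm must weigh across all requests.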

MetaCluster-TA: taxonomic annotation for metagenomic data based on assembly-assisted binning

Author: Chin FYL
Leung HCM
Wang Y
Yiu SM
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2014

This article is part of the supplement: Selected articles from the Twelfth Asia Pacific Bioinformatics Conference (APBC 2014): Genomics

Background: Taxonomic annotation of reads is an important problem in metagenomic analysis. Existing annotation tools, which align each read to the taxonomic structure, are unable to annotate many reads efficiently and accurately, because reads (100 bp) are short and most of them come from unknown genomes. Previous work has suggested assembling the reads into longer contigs before annotation. More reads/contigs can then be annotated, because a longer contig (kbp in length) can be aligned to a taxon even if it comes from an unknown species, as long as it contains a conserved region of that taxon. Unfortunately, existing metagenomic assembly tools are not mature enough to produce sufficiently long contigs. Binning tries to group reads/contigs of similar species together. Intuitively, reads in the same group (cluster) should be annotated to the same taxon, and, if the quality of binning is high, these reads together should cover a significant portion of the genome, alleviating the problem of short contigs. However, no existing work has tried to use binning results to help solve the annotation problem. This work explores this direction.

Results: In this paper, we describe MetaCluster-TA, an assembly-assisted, binning-based annotation tool built on the innovative idea of annotating binned reads instead of aligning each read or contig to the taxonomic structure separately. We propose the novel concept of the 'virtual contig' (up to 10 kb in length) to represent a set of reads, and then represent each cluster as a set of virtual contigs (together up to 1 Mb in length) for annotation.
MetaCluster-TA can outperform the widely used MEGAN4 and can annotate (1) more reads, since the virtual contigs are much longer; (2) more accurately, since each cluster of long virtual contigs contains global information about the sampled genome, which tends to be more accurate than short reads or assembled contigs that contain only local information about the genome; and (3) more efficiently, since there are far fewer long virtual contigs than short reads to align. MetaCluster-TA also outperforms MetaCluster 5.0 as a binning tool, since binning itself can be more sensitive and precise given long virtual contigs, and the binning results can be improved using the reference taxonomic database.

Conclusions: MetaCluster-TA can outperform the widely used MEGAN4 and can annotate more reads with higher accuracy and higher efficiency. It also outperforms MetaCluster 5.0 as a binning tool.

Springer - Publisher Connector

HKU Scholars Hub
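The central idea of annotating a whole cluster and propagating the label to its reads, instead of annotating each read separately, can be illustrated with a deliberately simplified sketch. It assumes each cluster's virtual contigs have already been aligned and assigned a taxon (or `None` when unaligned), and labels the cluster with the taxon supported by the most aligned bases. All names are hypothetical, and this length-weighted vote is not MetaCluster-TA's actual annotation procedure.

```python
from collections import Counter
from typing import Dict, List, Optional, Tuple

def annotate_clusters(
        contig_hits: Dict[int, List[Tuple[int, Optional[str]]]]) -> Dict[int, Optional[str]]:
    """Label each cluster with the taxon backed by the most aligned contig length.

    contig_hits maps cluster id -> [(virtual contig length, assigned taxon or None)].
    """
    labels: Dict[int, Optional[str]] = {}
    for cluster, hits in contig_hits.items():
        support: Counter = Counter()
        for length, taxon in hits:
            if taxon is not None:
                support[taxon] += length          # weight votes by contig length
        # Ties broken alphabetically; clusters with no aligned contigs stay unlabeled.
        labels[cluster] = max(sorted(support), key=support.get) if support else None
    return labels

def annotate_reads(read_cluster: Dict[str, int],
                   labels: Dict[int, Optional[str]]) -> Dict[str, Optional[str]]:
    """Propagate each cluster's label to every read binned into it."""
    return {read: labels.get(c) for read, c in read_cluster.items()}
```

Every read in a cluster inherits the cluster's label, including reads that would be too short to align confidently on their own, which is the effect the abstract attributes to annotating clusters of long virtual contigs.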